ON THE STABILITY AND ERGODICITY OF AN ADAPTIVE 
SCALING METROPOLIS ALGORITHM 



MATTI VIHOLA 

Abstract. The stability and ergodicity properties of an adaptive random walk Metropolis algorithm are considered. The algorithm adjusts the scale of the symmetric proposal distribution continuously based on the observed acceptance probability. Unlike the previously proposed forms of this algorithm, the adapted scaling parameter is not constrained within a predefined compact interval. This makes the algorithm more generally applicable and 'automatic', with two fewer parameters to adjust. A strong law of large numbers is shown to hold for functionals bounded on compact sets and growing at most exponentially as $\|x\|\to\infty$, assuming that the target density is smooth enough and has either compact support or super-exponentially decaying tails.


1. Introduction

Markov chain Monte Carlo (MCMC) is a general method often used to approximate integrals of the type
$$ I := \int_{\mathbb{R}^d} f(x)\pi(x)\,dx < \infty $$
where $\pi$ is a probability density function [see, e.g., 17]. The method is based on a Markov chain $(X_n)_{n\ge1}$ that can be simulated in practice, and for which $I_n := \frac{1}{n}\sum_{k=1}^n f(X_k)\to I$ as $n\to\infty$. Such a chain can be constructed, for example, as follows. Assume $q$ is a zero-mean Gaussian probability density, and let $X_1 = x_1$ for some fixed point $x_1\in\mathbb{R}^d$. For $n\ge2$, recursively,

(S1) set $Y_n := X_{n-1} + \theta W_n$, where $W_n$ are independent random vectors distributed according to $q$, and

(S2) with probability $\alpha_n := \min\{1, \pi(Y_n)/\pi(X_{n-1})\}$ the proposal is accepted and $X_n = Y_n$; otherwise the proposal is rejected and $X_n = X_{n-1}$.

For any positive scalar parameter $\theta$, this symmetric random walk Metropolis algorithm is valid: $I_n\to I$ almost surely as $n\to\infty$ [e.g. 13, Theorem 1]. However, the efficiency of the method, that is, the speed at which $I_n$ converges to $I$, is crucially affected by the choice of $\theta$. For too large $\theta$, few proposals are accepted, and the chain mixes poorly. For too small $\theta$, most of the proposals $Y_n$ are accepted, but the steps $X_n - X_{n-1}$ are small, again preventing good mixing. In fact, previous results indicate that the acceptance probability is closely related to the efficiency of the algorithm. The 'rule of thumb' is that the acceptance probability $\alpha_n$ should be about 0.234 on average, although this choice is not always optimal [18, 19, 21]. In practice, such a $\theta$ is usually found by several trial runs, which can be laborious and time-consuming.

So-called adaptive MCMC algorithms have gained popularity since the seminal work of Haario, Saksman, and Tamminen [10]. Several other such algorithms have been proposed after Andrieu and Robert [2] noticed the connection between Robbins-Monro stochastic approximation and adaptive MCMC [15, 16]. The Adaptive Scaling Metropolis (ASM) algorithm considered in this paper optimises the scaling of the proposal distribution adaptively, based on the observed acceptance probability. Namely, in step (S1) of the above algorithm, the constant $\theta$ is replaced with a random variable $\theta_{n-1}$, set initially to $\theta_1 > 0$ and for $n\ge2$ defined recursively through

(S3) $\log\theta_n = \log\theta_{n-1} + \eta_n(\alpha_n - \alpha^*)$

where $\alpha^*$ is the desired mean acceptance probability, for example $\alpha^* = 0.234$, and $(\eta_n)_{n\ge2}$ is a sequence of adaptation step sizes decaying to zero.
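The recursion (S1)-(S3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `asm_sampler`, the standard Gaussian choice of $q$, and the default step sizes $\eta_n = c\,n^{-\gamma}$ with $c=1$ and $\gamma=0.66$ (inside the range $\gamma\in(1/2,1]$ used in Section 2) are choices made for the example.

```python
import numpy as np


def asm_sampler(log_pi, x1, n_iter, theta1=1.0, alpha_star=0.234,
                c=1.0, gamma=0.66, rng=None):
    """Sketch of the Adaptive Scaling Metropolis recursion (S1)-(S3).

    log_pi : function returning log pi(x) up to an additive constant;
             it may return -np.inf outside the support of pi.
    Returns the chain X_1..X_n, the scales theta_1..theta_n and the
    acceptance probabilities alpha_2..alpha_n (alphas[0] is NaN).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x1, dtype=float))
    lp_x = log_pi(x)
    log_theta = np.log(theta1)

    chain = np.empty((n_iter, x.size))
    thetas = np.empty(n_iter)
    alphas = np.full(n_iter, np.nan)
    chain[0], thetas[0] = x, theta1

    for n in range(2, n_iter + 1):
        # (S1) Gaussian random-walk proposal with the current scale theta_{n-1}
        y = x + np.exp(log_theta) * rng.standard_normal(x.size)
        lp_y = log_pi(y)
        # (S2) accept with probability alpha_n = min{1, pi(Y_n) / pi(X_{n-1})}
        alpha = float(np.exp(min(0.0, lp_y - lp_x)))
        if rng.uniform() < alpha:
            x, lp_x = y, lp_y
        # (S3) log theta_n = log theta_{n-1} + eta_n * (alpha_n - alpha*)
        log_theta += c * n ** (-gamma) * (alpha - alpha_star)
        chain[n - 1], thetas[n - 1], alphas[n - 1] = x, np.exp(log_theta), alpha

    return chain, thetas, alphas
```

Running it on, say, a two-dimensional standard Gaussian target (`log_pi = lambda x: -0.5 * float(x @ x)`), the average of `alphas` over the later iterations should settle near `alpha_star`. Note that no bounds are imposed on `log_theta`, which is the unconstrained setting studied in this paper.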

A similar random walk Metropolis algorithm with adaptive scaling was actually proposed over a decade ago by Gilks, Roberts and Sahu [9]. Their approach differed from the ASM approach in that the adaptation was performed only at particular regeneration times, which may occur infrequently or may be difficult to identify in practice. The ASM algorithm presented above has been proposed earlier by several authors [2, 16], perhaps with a slightly different update formula (S3). The exact form of (S3) was used by [3, 5]. The crucial difference between the present paper and the earlier works is that here the algorithm does not involve any additional constraints on $\theta_n$.

This difference is chiefly a theoretical advance, as discussed below. Therefore, no empirical studies of the performance of the ASM algorithm are included in the paper. It is nonetheless worth pointing out that since the ASM algorithm only adapts the scale of the proposal distribution, it is likely to be inefficient in certain situations. For example, if $\pi$ is high-dimensional and possesses a strong correlation structure, the ASM approach is likely to be suboptimal. In such a situation, one could use the Adaptive Metropolis (AM) algorithm [10]. It has also been suggested to combine the AM algorithm with ASM [3, 5]; indeed, the analysis of the present paper can also be used in this setting (see Remark 19).

It is not obvious that the ASM algorithm is valid, that is, that $I_n\to I$. In fact, there are examples of continuously adapting MCMC schemes that destroy the correct ergodic properties [15]. Current ergodicity results on adaptive MCMC algorithms assume some 'uniform' behaviour for all the possible MCMC kernels [5, 15]. In the context of the ASM algorithm, this essentially means that $\theta_n$ must be constrained to a predefined set $[a,b]$ with some $0<a<b<\infty$. Alternatively, one can use a general reprojection technique with a sequence of such sets $[a_n,b_n]$ with $a_n\searrow0$ and $b_n\nearrow\infty$, as proposed by Andrieu and Moulines [1], or stabilisation methods that modify the adaptation rule to ensure stable behaviour. Such constraints and stabilisation structures are theoretically convenient, but may pose a problem for a practitioner. Good values for the constraint parameters may be difficult to choose without prior knowledge of the target distribution $\pi$. In the worst case, the values are chosen inappropriately and the algorithm is rendered useless in practice.

However, it is a common belief that many of the proposed adaptive MCMC algorithms are inherently stable and thereby do not require additional constraints or stabilisation structures. Indeed, there is considerable empirical evidence of the stability of several unconstrained algorithms, including the ASM approach. There are, however, only a few theoretical results so far. In particular, Saksman and Vihola [20] verify the correct ergodic properties and the stability of the Adaptive Metropolis algorithm [10], provided the target distribution $\pi$ has super-exponentially decaying tails with regular contours. These assumptions on $\pi$ are close to those that ensure the geometric ergodicity of a non-adaptive random walk Metropolis algorithm [12]. The result in [20] does not assume an upper bound, but requires an explicit lower bound for the adapted covariance parameter. In the context of the ASM algorithm, this is analogous to constraining $\theta_n$ to the interval $[a,\infty)$, where $a>0$.

The main results of this paper, formulated in the next section, show that the stability and ergodicity of the ASM algorithm can be verified under assumptions on the target distribution similar to those in [20], without any modifications or constraints on the adaptation parameter $\theta_n\in(0,\infty)$. These are the first results that validate the correctness of a completely unconstrained, fully adaptive MCMC algorithm.



2. The Main Results 

Throughout this section, suppose that the process $(X_n,\theta_n)_{n\ge1}$ follows the ASM recursion (S1)-(S3) described in Section 1, the proposal density $q$ is standard Gaussian, and the step sizes are defined as $\eta_n := cn^{-\gamma}$ with some constants $c>0$ and $\gamma\in(1/2,1]$. Before stating the first ergodicity result, consider the following condition on the regularity of a collection of sets. Recall that a $C^1$ domain in $\mathbb{R}^d$ is a domain whose boundary is locally the graph of a $C^1$ function.

Definition 1. Suppose that $\{A_i\}_{i\in I}$ is a collection of sets $A_i\subset\mathbb{R}^d$, each consisting of finitely many disjoint components that are closures of $C^1$ domains. Let $n_i(x)$ stand for the outer-pointing normal at $x$ on the boundary $\partial A_i$. Then $\{A_i\}_{i\in I}$ have uniformly continuous normals if for all $\epsilon>0$ there is a $\delta>0$ such that for any $i\in I$ it holds that $\|n_i(x)-n_i(y)\|\le\epsilon$ for all $x,y\in\partial A_i$ such that $\|x-y\|\le\delta$.

This definition essentially states that the boundaries $\partial A_i$ must be regular enough to ensure that, at a small enough scale, each $\partial A_i$ looks almost like a plane, uniformly in $i$.
Theorem 2. Assume $\pi$ has a compact support $\mathsf{X}\subset\mathbb{R}^d$ and $\pi$ is continuous, bounded and bounded away from zero on $\mathsf{X}$. Moreover, assume that the set $\mathsf{X}$ has a uniformly continuous normal (Definition 1). Then, for any $0<\alpha^*<1/2$ and any bounded function $f$, the strong law of large numbers holds, that is,
$$ \frac{1}{n}\sum_{k=1}^n f(X_k)\xrightarrow{n\to\infty}\int_{\mathbb{R}^d} f(x)\pi(x)\,dx \quad\text{almost surely.} \tag{1} $$

Proof. Theorem 2 is a special case of Theorem 21 in Section 5. □
Let us next consider target distributions $\pi$ with unbounded supports, satisfying the following conditions formulated in [20].



Assumption 3. The density $\pi$ is bounded, bounded away from zero on compact sets, differentiable, and
$$ \lim_{r\to\infty}\sup_{\|x\|\ge r}\frac{x}{\|x\|^\rho}\cdot\nabla\log\pi(x) = -\infty \tag{2} $$
for some $\rho>1$, where $\|\cdot\|$ stands for the Euclidean norm. Moreover, the contour normals satisfy
$$ \lim_{r\to\infty}\sup_{\|x\|\ge r}\frac{x}{\|x\|}\cdot\frac{\nabla\pi(x)}{\|\nabla\pi(x)\|} < 0. \tag{3} $$

This assumption is very close to the conditions introduced by Jarner and Hansen [12] to ensure the geometric ergodicity of a (non-adaptive) Metropolis algorithm, and considered by Andrieu and Moulines [1] in the context of adaptive MCMC. In particular, they assume that $\pi$ fulfils the contour regularity condition (3). Instead of (2), they assume a super-exponential decay of $\pi$,
$$ \lim_{r\to\infty}\sup_{\|x\|\ge r}\frac{x}{\|x\|}\cdot\nabla\log\pi(x) = -\infty, $$
which is only slightly more general than (2). See [12] for examples and a discussion of the conditions.

Theorem 4. Suppose $\pi$ fulfils Assumption 3 and there is a $t_0>0$ such that the contour sets $\{L_t\}_{0<t\le t_0}$, where $L_t := \{x\in\mathbb{R}^d:\pi(x)\ge t\}$, have uniformly continuous normals (Definition 1). Assume the function $f$ is bounded on compact sets and grows at most exponentially, that is, there exist constants $M,\xi<\infty$ such that $|f(x)|\le M\max\{1,e^{\xi\|x\|}\}$ for all $x\in\mathbb{R}^d$. Then, for any $0<\alpha^*<1/2$, the strong law of large numbers (1) holds.

Proof. Theorem 4 is a special case of Theorem 23 in Section 5. □

Remark 5. For many practical target densities satisfying Assumption 3, the tail contours are (essentially) scaled copies of each other, in which case they automatically have uniformly continuous normals. This indicates that Theorem 4 is practically a counterpart of [20, Theorem 13], which verifies the ergodicity of the Adaptive Metropolis algorithm.
Remark 6. The 'safe' values for the desired acceptance rate stipulated by Theorems 2 and 4 are $\alpha^*\in(0,1/2)$. The values $[1/2,1)$ are excluded for technical reasons, in particular due to Proposition 12, which establishes the lower bound for $\theta_n$.¹ It is expected that Theorems 2 and 4 would hold assuming only $\alpha^*\in(0,1)$, but this cannot be verified with the present technical approach. The range $\alpha^*\in(0,1/2)$ is, however, often sufficient in practice, as the most commonly used values for random walk Metropolis algorithms are probably $\alpha^*=0.234$ and $\alpha^*=0.44$, and it has been suggested that values $\alpha^*\in[0.1,0.4]$ should work well in most cases [18, 19].



Remark 7. The results below hold for the above algorithmic setting, but allow some modifications. One can use a non-Gaussian proposal distribution $q$; in particular, the results hold for heavy-tailed multivariate Student proposals. The step size sequence $(\eta_n)_{n\ge2}$ can be selected quite freely; essentially, $(\eta_n)_{n\ge2}$ must only be square-summable. Observe, however, that a sequence with $\sum_n\eta_n<\infty$ prevents efficient adaptation, as then $\theta_n$ is trivially bounded within $[a,b]$ for some $0<a<b<\infty$.
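The last observation follows from a one-line bound: since $\alpha_k$ and $\alpha^*$ both lie in $[0,1]$, each increment in (S3) satisfies $|\eta_k(\alpha_k-\alpha^*)|\le\eta_k$, and summing the increments gives the following (the symbol $\Lambda$ is notation introduced only for this display).

```latex
|\log\theta_n - \log\theta_1|
  = \Big|\sum_{k=2}^{n}\eta_k(\alpha_k - \alpha^*)\Big|
  \le \sum_{k=2}^{\infty}\eta_k =: \Lambda < \infty
\quad\Longrightarrow\quad
\theta_1 e^{-\Lambda} \le \theta_n \le \theta_1 e^{\Lambda}
\quad\text{for all } n \ge 1.
```

In other words, one may take $a=\theta_1e^{-\Lambda}$ and $b=\theta_1e^{\Lambda}$, both fixed before the run, so the adaptation cannot move $\theta_n$ towards a good value lying outside this interval.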

The rest of the paper is organised as follows. Section 3 describes a general adaptive MCMC framework and, within it, a generalised version of the ASM algorithm described above. Section 4 develops stability results for this process. In particular, Corollary 14 ensures the stability of the sequence $\theta_n$ under the assumptions of Theorem 2, and Proposition 17 controls the growth of $\theta_n$ when $\pi$ fulfils the conditions of Theorem 4. Once the stability results are obtained, the ergodicity is verified in Section 5 using the results in [20].

3. Notation and Framework 

Consider first a general adaptive MCMC process evolving in a measurable space $\mathsf{X}\times\mathsf{S}$, where $\mathsf{X}$ is the space of the 'MCMC chain' $(X_n)_{n\ge1}$ and $\mathsf{S}$ the space of the adaptation parameter $(S_n)_{n\ge1}$. The process starts at some given $X_1 = x_1\in\mathsf{X}$ and $S_1 = s_1\in\mathsf{S}$, and for $n\ge1$, follows the recursion²
$$ X_{n+1} = \begin{cases} Y_{n+1}, & \text{if } U_{n+1}\le\alpha_{S_n}(X_n,Y_{n+1}),\\ X_n, & \text{otherwise,}\end{cases} \tag{4} $$
$$ S_{n+1} = S_n + \eta_{n+1}H(S_n,X_n,Y_{n+1}) \tag{5} $$
where the acceptance probability $\alpha_s:\mathsf{X}\times\mathsf{X}\to[0,1]$ for each $s\in\mathsf{S}$, and $H:\mathsf{S}\times\mathsf{X}\times\mathsf{X}\to K_H$ is an adaptation function, with $K_H\subset\mathbb{R}$ compact. The $\sigma$-algebras $\mathcal{F}_1\subset\mathcal{F}_2\subset\cdots$ are assumed to be such that the random variables $U_n$ and $Y_n$ are $\mathcal{F}_n$-measurable, $U_{n+1}$ is independent of $\mathcal{F}_n$ and uniformly distributed on $[0,1]$, and $Y_{n+1}$ depends on $\mathcal{F}_n$ only via $X_n$ and $S_n$. Namely, the $Y_n$ are distributed according to the proposal density so that $\mathbb{P}(Y_{n+1}\in A\mid\mathcal{F}_n) = \int_A q_{S_n}(X_n,y)\,dy$. The sequence of non-negative step sizes $(\eta_n)$ decays to zero.

¹ The $S_n$ of Proposition 12 and $\theta_n$ are related by $\theta_n = e^{S_n}$.

² The recursion (5) can be considered as a Robbins-Monro stochastic approximation; see [1] and references therein.
Let us next consider a generalisation of the ASM algorithm of Section 1. Let $\mathsf{S} = \mathbb{R}$ and define $\mathsf{X}\subset\mathbb{R}^d$ as the support of $\pi$. The family of proposal densities is defined as $q_s(x,y) := q_s(x-y)$ with
$$ q_s(z) := [\phi(s)]^{-d}\,q\big([\phi(s)]^{-1}z\big), $$
where the template probability density $q$ on $\mathbb{R}^d$ is symmetric, and the scaling function $\phi:\mathbb{R}\to(0,\infty)$ is increasing and surjective. To shed light on this definition, let $Y$ be distributed according to $q$. Then $\phi(s)Y$ is distributed according to $q_s$. In the context of the particular version of the algorithm described in Section 1, one has $\phi(s) = e^s$ and $S_n = \log\theta_n$. The acceptance probability is the Metropolis-Hastings ratio³
$$ \alpha_s(x,y) := \alpha(x,y) := \min\Big\{1,\frac{\pi(y)}{\pi(x)}\Big\}. $$

The adaptation function $H$ is defined as $H(s,x,y) := H(x,y) := \alpha(x,y)-\alpha^*$, where $\alpha^*$ is the constant desired acceptance rate, and the step sizes satisfy $\sum_{k=2}^\infty\eta_k=\infty$ and $\sum_{k=2}^\infty\eta_k^2<\infty$.
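For the step sizes $\eta_n = c\,n^{-\gamma}$ of Section 2, both conditions reduce to the $p$-series test, which is exactly why $\gamma$ is restricted to $(1/2,1]$ there:

```latex
\sum_{k=2}^{\infty} c\,k^{-\gamma} = \infty \iff \gamma \le 1,
\qquad
\sum_{k=2}^{\infty} c^2 k^{-2\gamma} < \infty \iff 2\gamma > 1,
\qquad\text{so both hold} \iff \gamma \in (1/2, 1].
```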

Define the expected acceptance rate at $x\in\mathsf{X}$ with parameter $s\in\mathsf{S}$ as
$$ \mathrm{acc}(x,s) := \int_{\mathsf{X}}\alpha(x,y)q_s(x-y)\,dy. $$
Clearly, the adaptation rule decreases $S_n$ on average whenever $\mathrm{acc}(X_n,S_n)<\alpha^*$, and increases it whenever $\mathrm{acc}(X_n,S_n)>\alpha^*$. So it is plausible that the algorithm would result in $S_n\to s^*$ such that $\overline{\mathrm{acc}}(s^*)=\alpha^*$, where
$$ \overline{\mathrm{acc}}(s) := \int_{\mathsf{X}}\mathrm{acc}(x,s)\pi(x)\,dx $$
is the expected acceptance rate over the target density $\pi$. In this paper, however, the main concern is not the convergence of $S_n$ but its stability, which turns out to be crucial for the validity of the ASM algorithm.
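The expected acceptance rate $\mathrm{acc}(x,s)$ is easy to approximate by simulation, since $x+\phi(s)Z$ with $Z\sim q$ has density $q_s(x-\cdot)$ by the symmetry of $q$. The sketch below assumes a standard Gaussian template $q$ and the exponential scaling $\phi(s)=e^s$ of Section 1; the names `acc_mc` and `log_std_normal` and the sample size are illustrative choices. For a one-dimensional standard Gaussian target and $x=0$ the integral has the closed form $\mathrm{acc}(0,s)=(1+\phi(s)^2)^{-1/2}$, which serves as a check.

```python
import numpy as np


def acc_mc(log_pi, x, s, n_samples=200_000, phi=np.exp, rng=None):
    """Monte Carlo estimate of acc(x, s) = E[min{1, pi(Y) / pi(x)}],
    where Y = x + phi(s) * Z and Z ~ N(0, I) (standard Gaussian template).

    log_pi must map an array of shape (m, d) to an array of shape (m,).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = x + phi(s) * rng.standard_normal((n_samples, x.size))
    log_ratio = log_pi(y) - log_pi(x[None, :])
    # alpha(x, y) = min{1, pi(y) / pi(x)}, averaged over the proposals
    return float(np.mean(np.exp(np.minimum(0.0, log_ratio))))


def log_std_normal(y):
    """Unnormalised log density of the standard Gaussian, row-wise."""
    return -0.5 * np.sum(y ** 2, axis=-1)
```

For this Gaussian example, evaluating `acc_mc(log_std_normal, 0.0, s)` on a grid of $s$ values shows $\mathrm{acc}(0,\cdot)$ decreasing from about $1$ to about $0$, which is the monotone behaviour the adaptation rule (5) relies on.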

The Metropolis transition kernel with proposal density $q_s$ is given as
$$ P_s(x,A) := \mathbf{1}_A(x)\int_{\mathbb{R}^d}[1-\alpha(x,y)]q_s(x-y)\,dy + \int_A\alpha(x,y)q_s(x-y)\,dy \tag{6} $$
where $\mathbf{1}_A$ stands for the characteristic function of the set $A$. Using the kernels $P_s$, one can write (4) as $\mathbb{P}(X_{n+1}\in A\mid\mathcal{F}_n) = P_{S_n}(X_n,A)$. As usual, integration of a function $f$ with respect to a transition kernel is denoted as
$$ P_sf(x) := \int f(y)P_s(x,dy). $$
Let $V\ge1$ be a function. The $V$-norm of a function $f$ is defined as
$$ \|f\|_V := \sup_{x}\frac{|f(x)|}{V(x)}. $$
The closed ball in $\mathbb{R}^d$ is written as $B(x,r) := \{y:\|x-y\|\le r\}$, and the distance of a point $x\in\mathbb{R}^d$ from the set $A\subset\mathbb{R}^d$ is denoted as $d(x,A) := \inf\{\|x-y\|:y\in A\}$.



³ Note that $Y_{n+1}$ may lie outside $\mathsf{X}$, but $(X_n)_{n\ge1}\subset\mathsf{X}$ almost surely.
4. Stability

This section develops stability results, starting with a simple theorem on the general process given in Section 3. This theorem is auxiliary for the present paper, but may have applications to other adaptive MCMC algorithms of a similar type.

Theorem 8. Suppose $(X_n,S_n)_{n\ge1}$ follow the general recursions (4) and (5), and the step sizes satisfy $\sum_{n=1}^\infty\eta_n^2<\infty$.

(i) If there is a constant $a<\infty$ such that for all $n\ge1$
$$ \mathbb{E}[H(S_n,X_n,Y_{n+1})\mid\mathcal{F}_n]\le0\quad\text{whenever } S_n\ge a, $$
then $\limsup_{n\to\infty}S_n<\infty$ a.s.

(ii) If also $\sum_n\eta_n=\infty$ and there is a non-decreasing sequence of constants $(a_n)_{n\ge1}\subset\mathbb{R}$ such that
$$ \mathbb{E}[H(S_n,X_n,Y_{n+1})\mid\mathcal{F}_n]\le b\quad\text{whenever } S_n\ge a_n $$
for some $b<0$, then $\limsup_{n\to\infty}(S_n-a_n)\le0$ a.s.

Proof. Let $W_n := H(S_{n-1},X_{n-1},Y_n)\mathbf{1}_{\{S_{n-1}\ge a\}}$ for $n\ge2$, and define the martingale $(M_n,\mathcal{F}_n)_{n\ge1}$ by setting $M_1:=0$ and $M_n := \sum_{k=2}^n dM_k$ for $n\ge2$, with the differences $dM_n := \eta_n(W_n - \mathbb{E}[W_n\mid\mathcal{F}_{n-1}])$. Clearly,
$$ \sum_{k=2}^\infty\mathbb{E}[dM_k^2\mid\mathcal{F}_{k-1}]\le4c^2\sum_{k=2}^\infty\eta_k^2<\infty $$
where $c := \sup_{x\in K_H}|x|$. This implies that $M_n$ converges to an a.s. finite limit [e.g. 11, Theorem 2.15].

Let $(\tau_k)_{k\ge1}$ be the exit times of $S_n$ from $(-\infty,a)$, defined as $\tau_k := \inf\{n>\tau_{k-1}: S_n\ge a,\ S_{n-1}<a\}$, using the conventions $\tau_0=0$, $S_0<a$, and $\inf\emptyset=\infty$. Define also the latest exit from $(-\infty,a)$ by $\sigma_n := \sup\{\tau_k: k\ge1,\ \tau_k\le n\}$. Whenever $S_n\ge a$, one can write $S_n = S_{\sigma_n} + (M_n - M_{\sigma_n}) + Z_{\sigma_n,n}$, where
$$ Z_{m,n} := \sum_{k=m+1}^n\eta_k\,\mathbb{E}[W_k\mid\mathcal{F}_{k-1}]\le0 $$
by assumption. In this case,
$$ S_n\le S_{\sigma_n}+(M_n-M_{\sigma_n})\le\max\{S_1,\,a+c\eta_{\sigma_n}\}+2\sup_{k\ge1}|M_k|\le C \tag{7} $$
where $C$ is a.s. finite. If $S_n<a$, the claim holds trivially.

Assume then (ii). If $S_n<a_n$ for all $n$ greater than some $N_1(\omega)<\infty$, the claim is trivial. Suppose then that $S_n\ge a_n$ infinitely often. Define $(\tau_k)_{k\ge1}$ as the exit times of $S_n$ from $(-\infty,a_n)$ as above. The times $\tau_k$ must be a.s. finite in this case (and $S_n$ returns to $(-\infty,a_n)$ infinitely often). For suppose the contrary: then the last exit times are bounded, $\sigma_n\le\sigma<\infty$, and for $n>\sigma$ one may write
$$ S_n = S_\sigma + (M_n - M_\sigma) + Z_{\sigma,n}\le C' + Z_{\sigma,n} $$
where $M_n$ and $Z_{m,n}$ are defined as above, but using the random variables $W_n := H(S_{n-1},X_{n-1},Y_n)\mathbf{1}_{\{S_{n-1}\ge a_{n-1}\}}$, and the random variable $C'$ is a.s. finite as in (7). Now $Z_{\sigma,n}\le b\sum_{k=\sigma+1}^n\eta_k\to-\infty$ a.s. as $n\to\infty$, so $S_n<a_n$ a.s. for sufficiently large $n$, which is a contradiction.

Fix an $\epsilon>0$ and let $N_0 = N_0(\epsilon,\omega)$ be such that for all $n\ge N_0$ it holds that $c\eta_{\sigma_n}\le\epsilon/3$ and $|M_k - M_\infty|\le\epsilon/3$ for all $k\ge\sigma_n$. The claim follows from the estimate, valid whenever $S_n\ge a_n$,
$$ \begin{aligned} S_n &\le S_{\sigma_n}+(M_n-M_{\sigma_n}) = S_{\sigma_n-1}+\eta_{\sigma_n}H(S_{\sigma_n-1},X_{\sigma_n-1},Y_{\sigma_n})+(M_n-M_{\sigma_n})\\ &\le a_{\sigma_n}+\epsilon/3+|M_n-M_\infty|+|M_\infty-M_{\sigma_n}|\le a_n+\epsilon \end{aligned} $$
for all $n\ge N_0$. □

Remark 9. Theorem 8 generalises to an unbounded adaptation function $H$ under suitable additional assumptions. For example, if
$$ \limsup_{n\to\infty}|\eta_{n+1}H(S_n,X_n,Y_{n+1})| = 0 \quad\text{and}\quad \sum_{k=1}^\infty\eta_{k+1}^2\,\mathbb{E}\big[H(S_k,X_k,Y_{k+1})^2\mid\mathcal{F}_k\big]<\infty $$
hold almost surely, the proof applies with obvious changes. Moreover, the function $H$ may additionally depend on $U_{n+1}$ (or $X_{n+1}$).

Hereafter, consider the adaptive scaling Metropolis (ASM) algorithm described in Section 3. One can give simple conditions under which the result of Theorem 8 applies. This is due to the fact that one can write
$$ \mathbb{E}[H(S_n,X_n,Y_{n+1})\mid\mathcal{F}_n] = \mathrm{acc}(X_n,S_n)-\alpha^*, $$
so in light of Theorem 8 it is sufficient to find out when $\mathrm{acc}(x,s)$ is below or above $\alpha^*$.

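Proposition 10. Assume $\pi$ is supported on a compact set $\mathsf{X}\subset\mathbb{R}^d$ and $\alpha^*>0$. Then there are $b<0$ and $a\in\mathbb{R}$ such that
$$ \mathbb{E}[H(S_n,X_n,Y_{n+1})\mid\mathcal{F}_n]\le b\quad\text{whenever } S_n\ge a. \tag{8} $$

Proof. Without loss of generality, one can assume $0\in\mathsf{X}$. Let $\epsilon>0$ be sufficiently small so that $\int_{B(0,\epsilon)}q(z)\,dz\le\alpha^*/2$, and let $a$ be sufficiently large so that $\phi(s)\ge\operatorname{diam}(\mathsf{X})\,\epsilon^{-1}$ for all $s\ge a$. Since $\alpha(x,y)=0$ for $y\notin\mathsf{X}$, for all $x\in\mathsf{X}$
$$ \int\alpha(x,y)q_s(x-y)\,dy\le\int_{B(0,\operatorname{diam}(\mathsf{X}))}[\phi(s)]^{-d}q\big([\phi(s)]^{-1}z\big)\,dz=\int_{B(0,[\phi(s)]^{-1}\operatorname{diam}(\mathsf{X}))}q(u)\,du\le\int_{B(0,\epsilon)}q(u)\,du\le\frac{\alpha^*}{2}. $$
That is, (8) holds with $b=-\alpha^*/2<0$, whenever $s\ge a$. □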

Before stating the next result, which bounds the conditional expectation in the opposite direction, let us consider a condition on the tails of $\pi$.






Assumption 11. There is a $\lambda>0$ such that $L_\lambda := \{y\in\mathbb{R}^d:\pi(y)\ge\lambda\}$ is compact and $\pi$ is continuous on $L_\lambda$. Moreover, the sets in the collection $\{L_t\}_{0<t\le\lambda}$ have uniformly continuous normals (Definition 1).

Proposition 12. Suppose the target density $\pi$ satisfies Assumption 11. Then, for any $\alpha^*<1/2$, there are $a\in\mathbb{R}$ and $b>0$ such that
$$ \mathbb{E}[H(S_n,X_n,Y_{n+1})\mid\mathcal{F}_n]\ge b\quad\text{whenever } S_n\le a. \tag{9} $$

Before giving the proof of Proposition 12, let us outline the simple intuition behind it. For all $s$ small enough, the mass of $q_s$ is essentially concentrated on a small ball $B(0,\epsilon)$. If one looks at the target $\pi$ only on $B(x,\epsilon)$, there are basically two alternatives. The first one is that $\pi$ is approximately constant on that small ball, and $\mathrm{acc}(x,s)\approx1$. The second alternative is that $\pi$ decreases very rapidly in one direction, in which case the set $\{y:\pi(y)\ge\pi(x)\}$ looks like a half-space on the ball $B(x,\epsilon)$, and $\mathrm{acc}(x,s)\gtrsim1/2$.
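The dichotomy above, and the role of the threshold $1/2$, can be seen exactly in a simple case. Assume, for illustration only, that $\pi$ is the uniform density on $[0,1]$ (which satisfies the assumptions of Theorem 2) and that $q$ is the standard Gaussian. For $x\in[0,1]$ one then has $\alpha(x,y)=\mathbf{1}_{[0,1]}(y)$, so $\mathrm{acc}(x,s)=\Phi((1-x)/\phi(s))-\Phi(-x/\phi(s))$, with $\Phi$ the standard normal distribution function. The helper names below are hypothetical.

```python
from math import erf, sqrt


def std_normal_cdf(t):
    """Standard normal distribution function Phi(t)."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def acc_uniform(x, theta):
    """Exact acc(x, s) for pi = Uniform[0, 1] and a standard Gaussian
    template, where theta = phi(s) > 0 and x lies in [0, 1].

    Here alpha(x, y) = 1 if y is in [0, 1] and 0 otherwise, so acc(x, s)
    is the probability that x + theta * Z stays in [0, 1].
    """
    return std_normal_cdf((1.0 - x) / theta) - std_normal_cdf(-x / theta)


grid = [k / 100 for k in range(101)]
small = min(acc_uniform(x, 1e-3) for x in grid)  # worst case over x, tiny scale
large = max(acc_uniform(x, 1e2) for x in grid)   # worst case over x, huge scale
```

For a tiny scale, the acceptance rate is close to $1$ in the interior but only about $1/2$ at the endpoints, where $\{y:\pi(y)\ge\pi(x)\}$ is exactly a half-line; for a huge scale it is uniformly small, as in Proposition 10. Since the worst case over $x$ tends to $1/2$ rather than $1$, a positive drift in (9) can only be expected for $\alpha^*<1/2$, consistent with Remark 6.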

Let us start with a lemma on this 'half-space approximation.' 

Lemma 13. Suppose that the sets $\{A_i\}_{i\in I}$ with $A_i\subset\mathbb{R}^d$ have uniformly continuous normals (Definition 1). Then, for any $\epsilon>0$, there is a $\delta>0$ such that for any $i\in I$, any $x\in A_i$ and any $0<r\le\delta$, there is a half-space $T$ such that $B(x,r)\cap T\subset B(x,r)\cap A_i$ and the distance $d(x,T)\le\epsilon r$.

The claim is geometrically evident. The technical verification is given in Appendix A.

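Proof of Proposition 12. Fix an $\epsilon^*\in(0,1)$ and let $M\ge1$ be sufficiently large so that
$$ \int_{B(0,\phi(s)M)}q_s(z)\,dz=\int_{B(0,M)}q(z)\,dz\ge1-\epsilon^* \tag{10} $$
and, for any plane $P$ passing through the origin,
$$ \int_{\{z:\,d(z,P)\le\phi(s)M^{-1}\}}q_s(z)\,dz=\int_{\{z:\,d(z,P)\le M^{-1}\}}q(z)\,dz\le\epsilon^*. \tag{11} $$
By the compactness of $L_\lambda$ and the positivity of $\pi$ on it, one can find $\delta_1>0$ such that for all $x,y\in L_\lambda$ with $\|x-y\|\le\delta_1$ it holds that $|\log\pi(x)-\log\pi(y)|\le\epsilon^*$, so that
$$ 1-\alpha(x,y)=e^0-e^{\min\{0,\log\pi(y)-\log\pi(x)\}}\le|\log\pi(y)-\log\pi(x)|\le\epsilon^*. $$
Let $\delta_2>0$ be sufficiently small to satisfy Lemma 13 for the collection $\{L_t\}_{0<t\le\lambda}$ with the choice $\epsilon=M^{-2}$. Choose a small enough $a\in\mathbb{R}$ so that $\phi(a)M\le\min\{\delta_1,\delta_2\}$. Let $s\le a$, denote $r_s:=\phi(s)M$, and write for any $x\in L_\lambda$
$$ \int\alpha(x,y)q_s(x-y)\,dy\ge\int_{B(x,r_s)\cap L_\lambda}\alpha(x,y)q_s(x-y)\,dy\ge(1-\epsilon^*)\int_{B(x,r_s)\cap L_\lambda}q_s(x-y)\,dy, $$
since $r_s\le\delta_1$. Denote by $T$ the half-space from Lemma 13 such that $B(x,r_s)\cap T\subset B(x,r_s)\cap L_\lambda$ and the distance $d(x,T)\le M^{-2}r_s$. One obtains
$$ \begin{aligned} \int\alpha(x,y)q_s(x-y)\,dy &\ge(1-\epsilon^*)\int_{B(x,r_s)\cap T}q_s(x-y)\,dy\\ &\ge(1-\epsilon^*)\int_{B(x,r_s)\cap\tilde T}q_s(x-y)\,dy-\int_{\{y:\,d(y,P)\le M^{-2}r_s\}}q_s(x-y)\,dy\\ &\ge\tfrac12(1-\epsilon^*)^2-\epsilon^*, \end{aligned} $$
where $\tilde T$ is the half-space whose boundary plane $P$ is parallel to the boundary of $T$ and passes through $x$. The last inequality follows from (10) together with the symmetry of $q_s$, and from (11), respectively. The same estimate holds for any $x\in\mathsf{X}$ with $\pi(x)=t<\lambda$, by applying Lemma 13 to $L_t$ and noting that $\alpha(x,y)=1$ for all $y\in L_t$. To conclude,
$$ \mathrm{acc}(x,s)=\int\alpha(x,y)q_s(x-y)\,dy\ge\tfrac12(1-\epsilon^*)^2-\epsilon^*\ge\frac{1/2+\alpha^*}{2} $$
for all $x\in\mathsf{X}$ and for any $\alpha^*<1/2$, by selecting $\epsilon^*=\epsilon^*(\alpha^*)>0$ sufficiently small. This implies (9) with $b=(1/2-\alpha^*)/2>0$. □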

As an easy corollary of the propositions above, one establishes the stability of the 
ASM process. 

Corollary 14. Assume the target density $\pi$ is compactly supported and satisfies Assumption 11. Then, for the ASM process $(X_n,S_n)_{n\ge1}$ with any $0<\alpha^*<1/2$, there are a.s. finite $A_1$ and $A_2$ such that
$$ A_1\le S_n\le A_2 \tag{12} $$
for all $n\ge1$.

Proof. The conditions of Propositions 10 and 12 are satisfied, so there are constants $-\infty<a_1<a_2<\infty$ and $b<0$ such that
$$ \mathbb{E}[H(S_n,X_n,Y_{n+1})\mid\mathcal{F}_n]\le b\quad\text{whenever } S_n\ge a_2, $$
$$ \mathbb{E}[H(S_n,X_n,Y_{n+1})\mid\mathcal{F}_n]\ge -b\quad\text{whenever } S_n\le a_1. $$
Theorem 8 applied to $-S_n$ and $S_n$ guarantees that $a_1\le\liminf_{n\to\infty}S_n$ and $\limsup_{n\to\infty}S_n\le a_2$, respectively, from which one obtains a.s. finite $A_1$ and $A_2$ for which (12) holds. □



The rest of this section considers targets $\pi$ with an unbounded support. For a suitably regular $\pi$, it is shown that the growth of $S_n$ can be controlled. To start with, consider the following properties of the scaling function $\phi$ and the template proposal distribution $q$.

Assumption 15. The scaling function $\phi$ is piecewise differentiable and there are constants $h,c>0$ and $\kappa\ge1$ such that
$$ \phi'(x+\xi)\le c\max\{1,\phi^\kappa(x)\} $$
for all $x\in\mathbb{R}$ and all $0\le\xi\le h$.

Assumption 15 is not restrictive, and it clearly holds for any polynomial or exponential $\phi$.

Assumption 16. The template proposal density $q$ can be written as $q(z)=\tilde q(\|\Sigma^{-1/2}z\|)$, where $\Sigma\in\mathbb{R}^{d\times d}$ is a symmetric and positive definite matrix and $\tilde q:[0,\infty)\to(0,\infty)$ is a bounded, decreasing and differentiable function. Moreover, there are $\epsilon^*>0$ and $0<a<b<\infty$ such that for all $0<\epsilon\le\epsilon^*$ the following bounds hold for the derivative of $\tilde q$:
$$ \tilde q'(x)-2\tilde q'(x+\epsilon)\ge c_1\quad\text{for all } a\le x\le b, $$
$$ \int_0^\infty\min\{0,\tilde q'(x)-2\tilde q'(x+\epsilon)\}\,dx\ge-c_2\epsilon^{1+c_3}, $$
with some constants $c_1,c_2,c_3>0$.

Assumption 16 stipulates that $q$ is elliptically symmetric: the contours of $q$ are ellipsoids whose main axes are aligned with the eigenvectors of $\Sigma$, with lengths proportional to the square roots of its eigenvalues. Moreover, the decay rate of $q$ along any ray is determined by $\tilde q$, which satisfies the technical bounds above. A lemma in Appendix B shows that Assumption 16 holds for Gaussian and Student distributions $q$.
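As a numerical sanity check (not a proof), the following sketch evaluates the two quantities in Assumption 16 for the Gaussian radial profile $\tilde q(r)=e^{-r^2/2}$ with $\Sigma=I$ (normalising constants only rescale $c_1$ and $c_2$). The choices $a=0.5$, $b=1.5$, the truncation of the integral at $r=20$, and the function names are assumptions of the sketch.

```python
import numpy as np


def dq_gauss(r):
    """Derivative of the Gaussian radial profile q~(r) = exp(-r**2 / 2)."""
    return -r * np.exp(-0.5 * r ** 2)


def assumption16_terms(eps, a=0.5, b=1.5, r_max=20.0, n=200_001):
    """Return (min over [a, b] of q~'(x) - 2 q~'(x + eps),
               integral over [0, r_max] of min{0, q~'(x) - 2 q~'(x + eps)})."""
    r = np.linspace(0.0, r_max, n)
    g = dq_gauss(r) - 2.0 * dq_gauss(r + eps)
    on_ab = (r >= a) & (r <= b)
    neg = np.minimum(0.0, g)
    h = r[1] - r[0]
    # composite trapezoidal rule on the uniform grid
    neg_integral = h * (neg.sum() - 0.5 * (neg[0] + neg[-1]))
    return float(g[on_ab].min()), float(neg_integral)


for eps in (0.2, 0.1, 0.05):
    print(eps, assumption16_terms(eps))
```

For these values of $\epsilon$ the first quantity stays bounded away from zero on $[a,b]$, while the negative part of the integrand only appears far in the tail, where the Gaussian profile is already negligible, so its integral is close to zero and shrinks rapidly as $\epsilon$ decreases.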

The following estimate of the at most polynomial growth of $\phi(S_n)$ is crucial for the ergodicity result obtained in Theorem 23.

Proposition 17. Suppose $\pi$ satisfies Assumptions 3 and 11. Suppose also that the scaling function $\phi$ satisfies Assumption 15 and the template density $q$ fulfils Assumption 16. Then, for the ASM process $(X_n,S_n)_{n\ge1}$ with $0<\alpha^*<1/2$, and for any $\beta>0$, there are an a.s. positive $\Theta_1=\Theta_1(\omega)$ and an a.s. finite $\Theta_2=\Theta_2(\omega,\beta)$ such that
$$ \Theta_1\le\phi(S_n)\le\Theta_2n^\beta\quad\text{for all } n\ge1. $$

Before the proof, let us consider an estimate of $\mathrm{acc}(x,s)$ depending on both $x$ and $s$.

Lemma 18. Assume $\pi$ satisfies Assumption 3. Then, for any $\epsilon>0$, there is a constant $c=c(\epsilon)\ge1$ such that $\mathrm{acc}(x,s)\le\epsilon$ whenever $\phi(s)\ge c\max\{1,\|x\|\}$.

Proof. Let $r_1>1$ be sufficiently large so that for some $\gamma>0$ it holds that $\frac{x}{\|x\|}\cdot\frac{\nabla\pi(x)}{\|\nabla\pi(x)\|}\le-\gamma$ and $\frac{x}{\|x\|^\rho}\cdot\nabla\log\pi(x)\le-\gamma$ for all $\|x\|\ge r_1$. Increase $r_1$, if necessary, so that for any $\|x\|\ge r_1$ one can write $L_{\pi(x)}=\{y:\pi(y)\ge\pi(x)\}=\{ru:u\in S^{d-1},\ 0\le r\le g(u)\}$, where $S^{d-1}:=\{u\in\mathbb{R}^d:\|u\|=1\}$ is the unit sphere and the function $g:S^{d-1}\to(0,\infty)$ parameterises the boundary of $L_{\pi(x)}$. Notice also that the contour normal condition implies the existence of an $M\ge1$ such that $B(0,M^{-1}\|x\|)\subset L_{\pi(x)}\subset B(0,M\|x\|)$ for all $\|x\|\ge r_1$.
Write, for $\|x\|\ge r_2:=Mr_1$,
$$ \mathrm{acc}(x,s)=\int\alpha(x,y)q_s(x-y)\,dy\le\int_{\{d(y,L_{\pi(x)})\le\|x\|\}}q_s(x-y)\,dy+\sup_{y\in\mathbb{R}^d}q_s(x-y)\int_{\{d(y,L_{\pi(x)})>\|x\|\}}\alpha(x,y)\,dy. $$
The first term can be estimated from above by
$$ \int_{B(0,M\|x\|+\|x\|)}q_s(x-y)\,dy\le\int_{B(0,(M+2)\|x\|)}q_s(z)\,dz=\int_{B(0,r(s,x))}q(u)\,du\le\frac{\epsilon}{2} $$
whenever $r(s,x):=[\phi(s)]^{-1}(M+2)\|x\|\le\epsilon^*$ for some small enough $\epsilon^*=\epsilon^*(\epsilon)>0$, as in the proof of Proposition 10.

For the latter term, notice that
$$ \sup_{y\in\mathbb{R}^d}q_s(x-y)=[\phi(s)]^{-d}\sup_{z\in\mathbb{R}^d}q(z)\le c_1[\phi(s)]^{-d}. $$
The integral can be estimated by polar integration as
$$ \int_{\{d(y,L_{\pi(x)})>\|x\|\}}\alpha(x,y)\,dy\le c_d\sup_{u\in S^{d-1}}\int_{r\ge g(u)+\|x\|}r^{d-1}e^{\log\pi(ru)-\log\pi(g(u)u)}\,dr $$
where $c_d$ is the surface measure of the sphere $S^{d-1}$. Since $\|x\|\ge r_2$, one has that $g(u)\ge r_1>1$, and from the gradient decay condition one obtains that for $r\ge g(u)$
$$ \log\pi(ru)-\log\pi(g(u)u)=\int_{g(u)}^r\frac{tu}{\|tu\|}\cdot\nabla\log\pi(tu)\,dt\le-\gamma\int_{g(u)}^rt^{\rho-1}\,dt\le-\gamma g(u)^{\rho-1}[r-g(u)], $$
from which
$$ \int_{r\ge g(u)+\|x\|}r^{d-1}e^{\log\pi(ru)-\log\pi(g(u)u)}\,dr\le\frac{2}{\gamma}\int_0^\infty e^{-w}\,dw\,\sup_{r\ge g(u)+\|x\|}r^{d-1}e^{-\frac{\gamma}{2}g(u)^{\rho-1}[r-g(u)]}. $$
Consequently,
$$ \int_{\{d(y,L_{\pi(x)})>\|x\|\}}\alpha(x,y)\,dy\le c_d\frac{2}{\gamma}\sup_{g\ge1,\,f\ge1}\exp\Big((d-1)\log(g+f)-\frac{\gamma}{2}g^{\rho-1}f\Big)\le c_2 $$
with a finite constant $c_2$ whenever $\|x\|\ge r_2$.

To sum up, there is a $c_3>0$ such that for any $\|x\|\ge r_2$ and any $s$ satisfying
$$ \phi(s)\ge c_3\max\{1,\|x\|\}\ge\max\Big\{\frac{(M+2)\|x\|}{\epsilon^*},\Big(\frac{2c_1c_2}{\epsilon}\Big)^{1/d}\Big\} $$
it holds that $\mathrm{acc}(x,s)\le\epsilon$. For any $\|x\|<r_2$ there is an $x_0$ with $r_2\le\|x_0\|\le Mr_2$ such that $\pi(x_0)\le\pi(x)$. Consequently, $\alpha(x,y)\le\alpha(x_0,y)$ for all $y\in\mathbb{R}^d$, and therefore
$$ \mathrm{acc}(x,s)\le\int_{\{d(y,L_{\pi(x_0)})\le\|x_0\|\}}q_s(x-y)\,dy+\sup_{y\in\mathbb{R}^d}q_s(x-y)\int_{\{d(y,L_{\pi(x_0)})>\|x_0\|\}}\alpha(x_0,y)\,dy. $$
Repeating the above arguments, there is a finite constant $c_4$ such that $\mathrm{acc}(x,s)\le\epsilon$ whenever $\phi(s)\ge c_4\max\{1,\|x\|\}$. The claim follows with $c:=\max\{c_3,c_4\}$. □

Having Lemma 18 and the lower bound from Proposition 12, the proof of Proposition 17 can be obtained by applying the growth condition on $\|X_n\|$ established in [20].



Proof of Proposition 17. Proposition 12 applied with Theorem 8 to $-S_n$ gives an a.s. finite $A_1$ such that $A_1\le S_n$. Since $\phi$ is increasing, the variable $\Theta_1:=\phi(A_1)$ is a.s. positive, which shows the lower bound.

To check the polynomial growth condition for $\phi(S_n)$, it is first verified that $\|X_n\|$ grows at most polynomially. Fix an $\epsilon>0$ and let $\theta_1=\theta_1(\epsilon)>0$ and $a_1=a_1(\epsilon)\in\mathbb{R}$ be such that $\theta_1=\phi(a_1)$ and $\mathbb{P}(B_1)\ge1-\epsilon$, with $B_1:=\{\Theta_1\ge\theta_1\}=\{A_1\ge a_1\}$. Let $V(x):=c_\pi\pi^{-1/2}(x)$, where the constant $c_\pi:=[\sup_x\pi(x)]^{1/2}$ ensures that $V\ge1$. A proposition in Appendix B shows that the drift inequality
$$ P_sV(x)\le V(x)+b \tag{13} $$
holds for all $\phi(s)\ge\theta_1>0$ with some $b=b(\theta_1)<\infty$. Construct an auxiliary process $(X'_n,S'_n)_{n\ge1}$ coinciding with $(X_n,S_n)_{n\ge1}$ in $B_1$ by setting $(X'_n,S'_n):=(X_{\tau_n},S_{\tau_n})$, where the stopping times $\tau_n$ are defined as
$$ \tau_n:=\begin{cases}n, & \text{if } \phi(S_k)\ge\theta_1 \text{ for all } 1\le k\le n,\\ \inf\{1\le k\le n-1:\phi(S_{k+1})<\theta_1\}, & \text{otherwise.}\end{cases} $$
Having the inequality (13), set $\beta':=\beta/\kappa$, where the constant $\kappa\ge1$ is from Assumption 15, and use Proposition 10 of [20] to obtain the bound $\|X'_n\|\le\Theta'_\epsilon n^{\beta'}$ for some a.s. finite $\Theta'_\epsilon$. Since $\epsilon>0$ was arbitrary, one can let $\epsilon\to0$ and obtain an a.s. finite $\Theta'$ such that $\|X_n\|\le\Theta'n^{\beta'}$. Applying Lemma 18, one obtains that $\mathrm{acc}(X_n,S_n)\le\alpha^*/2$ whenever $\phi(S_n)\ge\Theta''n^{\beta'}$, with $\Theta'':=c\max\{1,\Theta'\}$ and $c=c(\alpha^*/2)$ the constant of Lemma 18.

Fix again an $\epsilon>0$ and let $\theta_2=\theta_2(\epsilon)<\infty$ be such that $\mathbb{P}(B_2)\ge1-\epsilon$, where $B_2:=\{\Theta''\le\theta_2\}$. Construct an auxiliary process $(X''_n,S''_n)_{n\ge1}$ coinciding with $(X_n,S_n)_{n\ge1}$ in $B_2$ by stopping the process as soon as $c\max\{1,\|X_k\|\}>\theta_2k^{\beta'}$ (which never happens on $B_2$), as in the construction above. Theorem 8 ensures that
$$ \limsup_{n\to\infty}[S''_n-a_n]\le0, $$
where the $a_n$ are defined so that $\phi(a_n)=\theta_2n^{\beta'}$. That is, $S''_n\le a_n+E_n$ with $E_n\to0$ almost surely. Consider Assumption 15 and take $N_0$ so large that $E_n\le h$ for all $n\ge N_0$. Then, for any $x$, $\phi(x+h)=\phi(x)+h\phi'(x+\xi)$ for some $0\le\xi\le h$, and hence $\phi(x+h)\le c_2\max\{1,\phi(x)^\kappa\}$. For $n\ge N_0$, one has
$$ \phi(S''_n)\le\phi(a_n+E_n)\le c_2\max\{1,\phi(a_n)^\kappa\}=c_2\max\{1,\theta_2^\kappa n^\beta\}\le\theta'_2n^\beta $$
for some finite $\theta'_2$. Summing up, there is an a.s. finite $\Theta_{2,\epsilon}$ such that $\phi(S_n)\le\Theta_{2,\epsilon}n^\beta$ on $B_2$. Finally, letting $\epsilon\to0$, one can find an a.s. finite $\Theta_2$ such that $\phi(S_n)\le\Theta_2n^\beta$. □

Remark 19. It is possible to obtain Corollary 14 and Proposition 17 when using the ASM algorithm within some other adaptation framework. For example, ASM can be combined with the Adaptive Metropolis algorithm as suggested in [3] and [5]. In particular, one could assume that there is another ($\mathcal{F}_n$-measurable) parameter $\tilde S_n\in\tilde{\mathsf{S}}$ in addition to $S_n$, so that $Y_{n+1}\sim q_{S_n,\tilde S_n}(X_n,\cdot)$ with
$$ q_{s,\tilde s}(x,y):=q_{s,\tilde s}(x-y):=[\phi(s)]^{-d}\,\hat q_{\tilde s}\big([\phi(s)]^{-1}(x-y)\big) $$
where $\{\hat q_{\tilde s}\}_{\tilde s\in\tilde{\mathsf{S}}}$ is a suitably 'uniform' family of symmetric probability densities. If there is an integrable function $\bar q$ such that $\hat q_{\tilde s}\le\bar q$ for all $\tilde s\in\tilde{\mathsf{S}}$, then Propositions 10 and 12 can be verified to hold, implying Corollary 14. Moreover, if $\tilde{\mathsf{S}}$ is a subset of positive definite $d\times d$ matrices with eigenvalues bounded away from zero and infinity, and $\hat q_{\tilde s}(z)=\det(\tilde s)^{-1/2}\tilde q(\|\tilde s^{-1/2}z\|)$ with $\tilde q$ satisfying Assumption 16, then Proposition 17 can be shown to hold.

5. Ergodicity 

In Section 4, the stability or controlled growth of the ASM process was established under certain conditions. This section employs these results to prove strong laws of large numbers for the ASM process, relying on the results introduced in [20]. For this purpose, consider the following alternative theoretical adaptation introduced in [20], which applies to a sequence of restriction sets $K_1\subset K_2\subset\cdots\subset K_n\subset\cdots\subset\mathsf{S}$.

Assume $(X_n, S_n)_{n\ge1}$ follow the adaptation framework described in Section 3 with $(Y_n, U_n)_{n\ge2}$ defined also as in Section 3. Assume $S_1 = s_1 \in K_1$ and, instead of (5), let $(S_n)_{n\ge1}$ follow the 'truncated' recursion

$$S_{n+1} = \sigma_{n+1}\big(S_n, \eta_{n+1} H(S_n, X_n, Y_{n+1})\big), \tag{14}$$

where the restriction functions $\sigma_n : \mathbb{S}\times\mathbb{S} \to \mathbb{S}$ are defined as

$$\sigma_n(s, s') := \begin{cases} s + s', & \text{if } s + s' \in K_n,\\ s, & \text{otherwise.} \end{cases}$$

That is, $\sigma_n$ ensures that $S_n \in K_n$ for all $n \ge 1$. Observe that such a 'truncated process' can be constructed using an 'original process' $(X_n, S_n)_{n\ge1}$ and $(Y_n, U_n)_{n\ge2}$ following (??) and (5), and so that the two processes coincide in the set $\bigcap_{n\ge1}\{S_n \in K_n\}$.
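The truncated recursion (14) can be sketched in a few lines of Python. This is a one-dimensional illustration under assumptions that are not taken from the text: the scaling function is $\phi(s) = e^s$, the proposal is $Y_{n+1} = X_n + \phi(S_n)W_{n+1}$ with standard Gaussian $W_{n+1}$, $H(s, x, y) = \alpha(x, y) - \alpha^*$ with the Metropolis acceptance probability $\alpha$, and the hypothetical callable `K(n)` returns the interval playing the role of $K_n$.

```python
import math
import random


def truncated_asm(log_pi, x1, s1, n_iter, alpha_star, K, eta, seed=0):
    """Sketch of the truncated ASM recursion (14) in one dimension.

    Illustrative assumptions (not from the text): phi(s) = exp(s),
    Y_{n+1} = X_n + phi(S_n) W_{n+1} with W_{n+1} ~ N(0, 1),
    H(s, x, y) = alpha(x, y) - alpha_star, and K(n) -> (lo, hi) is a
    hypothetical callable describing nested intervals K_1 ⊂ K_2 ⊂ ...
    Returns the lists [X_1, ..., X_n] and [S_1, ..., S_n].
    """
    rng = random.Random(seed)
    x, s = x1, s1
    xs, ss = [x], [s]
    for n in range(1, n_iter):
        y = x + math.exp(s) * rng.gauss(0.0, 1.0)
        diff = log_pi(y) - log_pi(x)
        alpha = 1.0 if diff >= 0 else math.exp(diff)   # min{1, pi(y)/pi(x)}
        x_next = y if rng.random() < alpha else x
        # sigma_{n+1}(S_n, eta_{n+1} H): keep the increment only if the result lies in K_{n+1}
        increment = eta(n + 1) * (alpha - alpha_star)
        lo, hi = K(n + 1)
        if lo <= s + increment <= hi:
            s = s + increment
        x = x_next
        xs.append(x)
        ss.append(s)
    return xs, ss
```

For example, with `K = lambda n: (-3.0, 1.0 + 0.5 * math.log(n))` and a standard Gaussian `log_pi`, every returned $S_n$ lies in the corresponding interval. Dropping the increment rather than projecting it onto $K_{n+1}$ mirrors the definition of $\sigma_n$, which leaves $S_n$ unchanged whenever $S_n + s'$ would leave the restriction set.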

Before stating an ergodicity result for this truncated chain, four technical assumptions are listed, which must hold for some constants $c \ge 1$ and $\beta \ge 0$.

(A1) For all measurable $A \subset \mathsf{X}$, it holds that $\mathbb{P}(X_{n+1} \in A \mid \mathcal{F}_n) = P_{S_n}(X_n, A)$ almost surely, and for each $s \in \mathbb{S}$, the transition probability $P_s$ has $\pi$ as the unique invariant distribution.

(A2) For each $n \ge 1$, the following uniform drift and minorisation conditions hold for all $s \in K_n$, all $x \in \mathsf{X}$ and all measurable $A \subset \mathsf{X}$:
$$P_s V(x) \le \lambda_n V(x) + b_n \mathbf{1}_{C_n}(x),$$
$$P_s(x, A) \ge \delta_n \mathbf{1}_{C_n}(x)\,\nu_s(A),$$
where $C_n \subset \mathsf{X}$ is a subset (a minorisation set), $V : \mathsf{X} \to [1, \infty)$ is a drift function such that $\sup_{x\in C_n} V(x) \le b_n$, and $\nu_s$ is a probability measure on $\mathsf{X}$ concentrated on $C_n$. Furthermore, the constants $\lambda_n \in (0,1)$ and $b_n \in (0,\infty)$ are increasing, $\delta_n \in (0,1]$ is decreasing with respect to $n$, and they are polynomially bounded so that
$$\max\{(1-\lambda_n)^{-1}, \delta_n^{-1}, b_n\} \le c n^{\beta}.$$

(A3) For all $n \ge 1$ and any $r \in (0,1]$, there is $c' = c'(r) \ge 1$ such that for all $s, s' \in K_n$,
$$\|P_s f - P_{s'} f\|_{V^r} \le c' n^{\beta}\|f\|_{V^r}|s - s'|.$$

(A4) The inequality $|H(S_n, X_n, Y_{n+1})| \le c n^{\beta}$ holds almost surely.

Theorem 20. Assume (A1)–(A4) hold and let $f$ be a function with $\|f\|_{V^\gamma} < \infty$ for some $\gamma \in (0,1)$. Assume $\beta < \kappa_*^{-1}\min\{1/2, 1-\gamma\}$ and $\sum_{k=1}^\infty k^{\kappa_*\beta - 1}\eta_k < \infty$, where $\kappa_* > 1$ is an independent constant. Then,

$$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \xrightarrow{n\to\infty} \int f(x)\pi(x)\,dx \quad\text{almost surely.} \tag{15}$$

Proof. This theorem is a straightforward modification of Theorem 2 in [20]. In particular, assumption (A4) here is slightly simpler than assumption (A4) in [20], and the changes required in the proof are obvious. □

The following first main result considers the case of a compactly supported $\pi$.

Theorem 21. Suppose $\pi$ has a compact support $\mathsf{X} \subset \mathbb{R}^d$ and $\pi$ is continuous, bounded and bounded away from zero on $\mathsf{X}$. Moreover, assume that the set $\mathsf{X}$ has a uniformly continuous normal (Definition ??) and the template proposal density $q$ satisfies Assumption 16. Then, for the ASM process $(X_n, S_n)_{n\ge1}$ with any $0 < \alpha^* < 1/2$ and a bounded function $f$, the strong law of large numbers holds, that is,

$$\frac{1}{n}\sum_{k=1}^{n} f(X_k) \xrightarrow{n\to\infty} \int f(x)\pi(x)\,dx \tag{16}$$

almost surely.
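As an informal illustration of (16), the following sketch runs an ASM chain on the uniform density on $[-1, 1]$ (compact support, bounded away from zero) and averages $f(x) = x^2$, whose integral against $\pi$ is $1/3$. The scaling function $\phi(s) = e^s$, the step sizes $\eta_n = n^{-2/3}$, the Gaussian increments and the target value $\alpha^* = 0.3$ are illustrative assumptions, chosen only so that $0 < \alpha^* < 1/2$ as the theorem requires; a single simulation is not a proof.

```python
import math
import random


def log_uniform(x):
    """Log-density (up to a constant) of the uniform distribution on [-1, 1]."""
    return 0.0 if -1.0 <= x <= 1.0 else -math.inf


def asm_average(log_pi, f, x1, n_iter, alpha_star=0.3, seed=1):
    """Ergodic average (1/n) * sum_{k<=n} f(X_k) along an ASM chain (sketch).

    Illustrative assumptions: phi(s) = exp(s), eta_n = n**(-2/3),
    Gaussian random-walk increments and S_1 = 0.
    """
    rng = random.Random(seed)
    x, s = x1, 0.0
    total = 0.0
    for n in range(1, n_iter + 1):
        total += f(x)
        y = x + math.exp(s) * rng.gauss(0.0, 1.0)
        diff = log_pi(y) - log_pi(x)
        # Metropolis acceptance probability; exp(-inf) = 0 rejects points outside the support
        alpha = 1.0 if diff >= 0 else math.exp(diff)
        if rng.random() < alpha:
            x = y
        # ASM update S_{n+1} = S_n + eta_{n+1} (alpha - alpha*)
        s += (n + 1) ** (-2.0 / 3.0) * (alpha - alpha_star)
    return total / n_iter
```

Calling `asm_average(log_uniform, lambda x: x * x, 0.0, 100000)` should return a value close to $1/3$.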

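Proof. Corollary 11 ensures that for any $\epsilon > 0$, there are $-\infty < a_1^{(\epsilon)} < a_2^{(\epsilon)} < \infty$ such that $\mathbb{P}(B^{(\epsilon)}) \ge 1 - \epsilon$, where

$$B^{(\epsilon)} := \{\forall n \ge 1 : a_1^{(\epsilon)} \le S_n \le a_2^{(\epsilon)}\}.$$

Set $K_n^{(\epsilon)} := K^{(\epsilon)} := [a_1^{(\epsilon)}, a_2^{(\epsilon)}]$ for all $n \ge 1$, and construct the truncated process $(X_n^{(\epsilon)}, S_n^{(\epsilon)})$ using these restriction sets in (14). Define $\theta_1^{(\epsilon)} := \phi(a_1^{(\epsilon)}) > 0$ and $\theta_2^{(\epsilon)} := \phi(a_2^{(\epsilon)}) < \infty$.

Let us next verify the above assumptions (A1)–(A4) with some $c \ge 1$, $\beta = 0$ and $V \equiv 1$. Assumption (A1) holds by construction of the process and the Metropolis kernel. For (A2), take $C_n := \mathsf{X}$ for all $n \ge 1$, and notice that $P_s V(x) = 1$ for all $x \in \mathsf{X}$ and $s \in \mathbb{S}$. By Assumption 16, one can estimate for all $s \in K^{(\epsilon)}$ and all $x \in \mathsf{X}$,

$$P_s(x, A) \ge \int_A \alpha(x, y)\, q_s(x - y)\,dy \ge \Big(\inf_{x, y\in\mathsf{X},\, s\in K^{(\epsilon)}} q_s(x - y)\Big)\int_A \frac{\pi(y)}{\sup_{z\in\mathsf{X}}\pi(z)}\,dy \ge \delta\,\nu(A)$$

with a $\delta > 0$, where $\nu_s(A) := \nu(A) := c_1^{-1}\int_A \pi(y)/\sup_{z\in\mathsf{X}}\pi(z)\,dy$ with $c_1 > 0$ chosen so that $\nu(\mathsf{X}) = 1$. The infimum is positive because $\mathsf{X}$ is bounded and $\phi(s) \in [\theta_1^{(\epsilon)}, \theta_2^{(\epsilon)}]$ for $s \in K^{(\epsilon)}$. Assumption 15 ensures that the derivative of $\phi$ is bounded on the compact set $K^{(\epsilon)}$. Therefore, the Frobenius norm satisfies $\|\phi(s)\Sigma - \phi(s')\Sigma\| \le c_2|s - s'|$ with some finite $c_2 = c_2(\epsilon)$, and Proposition 27 in Appendix B implies (A3). Finally, it holds that $|H(S_n, X_n, Y_{n+1})| \le c$, implying (A4).

All of (A1)–(A4) hold and
$$\sum_{k=1}^\infty k^{-1}\eta_k \le \Big(\sum_{k=1}^\infty k^{-2}\Big)^{1/2}\Big(\sum_{k=1}^\infty \eta_k^2\Big)^{1/2} < \infty,$$
so Theorem 20 yields a strong law of large numbers for the truncated process $X_n^{(\epsilon)}$ in case of a bounded function $f$. Since $(X_n^{(\epsilon)})_{n\ge1}$ coincides with the original ASM process $(X_n)_{n\ge1}$ in $B^{(\epsilon)}$, the strong law of large numbers applies for $X_n(\omega)$ with almost every $\omega \in B^{(\epsilon)}$. Since $\epsilon > 0$ was arbitrary, (16) holds almost surely. □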
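The minorisation step of this proof can be made concrete in a toy case. Assume, for illustration only, that $\pi$ is uniform on $\mathsf{X} = [-w, w]$ and that $q_s$ is the centred Gaussian density with standard deviation $\phi(s) \in [\theta_1^{(\epsilon)}, \theta_2^{(\epsilon)}]$. Then $\pi(y)/\sup\pi = 1$ on $\mathsf{X}$, $\nu$ is the uniform distribution on $\mathsf{X}$, $c_1 = 2w$, and the chain of inequalities gives $\delta = 2w\inf q_s(x - y)$, with the infimum over $|x - y| \le 2w$ and the admissible scales. The helper names below are hypothetical.

```python
import math


def gauss_density(z, sigma):
    """Centred Gaussian density with standard deviation sigma, evaluated at z."""
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def minorisation_constant(theta1, theta2, half_width=1.0):
    """Toy value of delta in P_s(x, A) >= delta * nu(A).

    Assumes pi uniform on X = [-w, w] (w = half_width), nu uniform on X and
    q_s Gaussian with standard deviation phi(s) in [theta1, theta2].
    """
    diam = 2.0 * half_width
    # q_s(x - y) is smallest at |x - y| = diam; for fixed z the map
    # sigma -> gauss_density(z, sigma) increases up to sigma = z and decreases
    # afterwards, so its infimum over [theta1, theta2] is attained at an endpoint.
    inf_q = min(gauss_density(diam, theta1), gauss_density(diam, theta2))
    # pi(y) / sup pi = 1 on X, so int_A pi / sup pi = Lebesgue(A) = diam * nu(A)
    return diam * inf_q
```

Widening the admissible range of scales can only decrease $\delta$, which is why the proof first confines $S_n$ to the compact set $K^{(\epsilon)}$.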

Remark 22. Theorem 20 (Theorem 2 of [20]) is a modification of Proposition 6 in [1]. Theorem 21 could also be obtained using other techniques, in particular the mixingale approach described in [??] and [13], or the coupling technique of [??] (resulting in a weak law of large numbers). These other techniques do not, however, apply directly to Theorem 23 below, where Theorem 20 is applied in full strength.

Finally, the second main result considers target densities $\pi$ with an unbounded support.

Theorem 23. Suppose $\pi$ satisfies Assumptions ?? and ?? and the scaling function $\phi$ satisfies Assumption 15. Assume also that there exist constants $M, \xi < \infty$ such that the function $f$ is bounded by $|f(x)| \le M\max\{1, e^{\xi\|x\|}\}$ for all $x \in \mathbb{R}^d$. Then, for the ASM process $(X_n, S_n)_{n\ge1}$ with any $0 < \alpha^* < 1/2$, the strong law of large numbers (16) holds.
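A standard Gaussian target has super-exponentially decaying tails, and $f(x) = e^{x/2}$ satisfies the growth condition of Theorem 23 with $M = 1$ and $\xi = 1/2$; its integral against $\pi$ is $e^{1/8}$. The sketch below, again assuming $\phi(s) = e^s$, $\eta_n = n^{-2/3}$ and Gaussian increments, also records the range of $\phi(S_n)$, which the proof controls through (17). It illustrates the statement on one simulated path and does not replace the argument.

```python
import math
import random


def asm_run(log_pi, f, x1, n_iter, alpha_star=0.44, seed=2):
    """Run an ASM chain (sketch); return the ergodic average of f together with
    the smallest and largest value of phi(S_n) visited.

    Illustrative assumptions: phi(s) = exp(s), eta_n = n**(-2/3), Gaussian W_n, S_1 = 0.
    """
    rng = random.Random(seed)
    x, s = x1, 0.0
    total = 0.0
    phi_min = phi_max = math.exp(s)
    for n in range(1, n_iter + 1):
        total += f(x)
        y = x + math.exp(s) * rng.gauss(0.0, 1.0)
        diff = log_pi(y) - log_pi(x)
        alpha = 1.0 if diff >= 0 else math.exp(diff)   # min{1, pi(y)/pi(x)}
        if rng.random() < alpha:
            x = y
        s += (n + 1) ** (-2.0 / 3.0) * (alpha - alpha_star)
        phi_min = min(phi_min, math.exp(s))
        phi_max = max(phi_max, math.exp(s))
    return total / n_iter, phi_min, phi_max
```

With `log_pi = lambda x: -0.5 * x * x` and `f = lambda x: math.exp(0.5 * x)`, the returned average should be close to $e^{1/8} \approx 1.133$.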

Proof. Proposition 17 ensures that for any $\beta' > 0$ there are an a.s. positive $\theta_1$ and an a.s. finite $\theta_2$ such that

$$\theta_1 \le \phi(S_n) \le \theta_2 n^{\beta'}. \tag{17}$$

Now, similarly as in the proof of Theorem 21, for any $\epsilon > 0$, one can find $0 < \theta_1^{(\epsilon)} < \theta_2^{(\epsilon)} < \infty$ such that

$$\mathbb{P}\big(\forall n \ge 1 : \theta_1^{(\epsilon)} \le \phi(S_n) \le \theta_2^{(\epsilon)} n^{\beta'}\big) \ge 1 - \epsilon \tag{18}$$

and construct $(X_n^{(\epsilon)}, S_n^{(\epsilon)})_{n\ge1}$ using the restriction sets $K_n^{(\epsilon)} := [a_1^{(\epsilon)}, a_{2,n}^{(\epsilon)}]$, where

$$\phi(a_1^{(\epsilon)}) = \theta_1^{(\epsilon)} \quad\text{and}\quad \phi(a_{2,n}^{(\epsilon)}) = \theta_2^{(\epsilon)} n^{\beta'}.$$

Let $V(x) := c_V\pi^{-1/2}(x)$ with $c_V := \sup_x \pi^{1/2}(x)$. The assumptions (A1) and (A4) hold as verified in the proof of Theorem 21. Proposition 26 in Appendix B, together with the fact $\det(\theta\Sigma) = \theta^d\det(\Sigma)$, yields (A2) with $\beta = d\beta'$. Assumption 15 ensures that $\phi'(s) \le c_1\phi^{\varrho}(s)$, from which $|\phi(s) - \phi(s')| \le c_1(\theta_2^{(\epsilon)} n^{\beta'})^{\varrho}|s - s'| \le c_2 n^{\varrho\beta'}|s - s'|$ for all $s, s' \in K_n^{(\epsilon)}$. Now, Proposition 27 in Appendix B shows (A3) with $\beta = c_3\beta'$. To conclude, the assumptions (A1)–(A4) hold with constants $(c, \beta)$, where $\beta = \beta(\epsilon, \beta') > 0$ can be selected to be arbitrarily small and $c = c(\epsilon, \beta) < \infty$.

In particular, one can let $\beta$ be sufficiently small to ensure that $\kappa_*\beta < 1/3$, so that $\sum_{k=1}^\infty k^{\kappa_*\beta - 1}\eta_k < \infty$ as in the proof of Theorem 21. One can take $\gamma = 1/2$ and observe that, since $\pi$ has super-exponentially decaying tails, $V^{1/2}(x) \ge c_4\max\{1, e^{\xi\|x\|}\}$ for some $c_4 > 0$, implying that $\sup_x |f(x)|/V^{1/2}(x) < \infty$. Theorem 20 ensures that the strong law of large numbers holds on the event in (18), and hence almost surely by letting $\epsilon \to 0$. □
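For a concrete instance of the last step, take the standard Gaussian $\pi$. Then $c_V = (2\pi)^{-1/4}$ and $V(x) = e^{x^2/4}$, so $V^{1/2}(x) = e^{x^2/8}$ and, for $f(x) = e^{\xi|x|}$, the ratio $|f(x)|/V^{1/2}(x) = e^{\xi|x| - x^2/8}$ attains its supremum $e^{2\xi^2}$ at $|x| = 4\xi$. The grid search below is a sketch (the grid range and resolution are arbitrary choices) that confirms this closed form numerically.

```python
import math


def drift_V(x):
    """V(x) = c_V * pi(x)**(-1/2) with c_V = sup pi**(1/2), for a standard normal pi."""
    pi_x = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    c_V = (2.0 * math.pi) ** -0.25          # sup_x pi(x)**(1/2), attained at x = 0
    return c_V / math.sqrt(pi_x)            # equals exp(x**2 / 4)


def sup_ratio(xi, M=1.0, half_width=30.0, steps=60001):
    """Grid approximation of sup_x |f(x)| / V(x)**(1/2) for f(x) = M max{1, e^{xi |x|}}."""
    best = 0.0
    for i in range(steps):
        x = -half_width + 2.0 * half_width * i / (steps - 1)
        f = M * max(1.0, math.exp(xi * abs(x)))
        best = max(best, f / math.sqrt(drift_V(x)))
    return best
```

For $\xi = 1$ the search returns approximately $e^2 \approx 7.389$; the supremum is finite for every $\xi$, which is what the proof needs.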

Remark 24. It is possible to extend Theorems 21 and 23 to an algorithm using the Adaptive Metropolis algorithm within the ASM framework [??, ??] by applying the observations in Remark 19.
Appendix A. Half-Space Approximation

Proof of Lemma ??. Fix an $\epsilon' > 0$. By the uniform smoothness of $\{\partial A_i\}_{i\in I}$, one can let $\delta > 0$ be so small that $\|n_i(y) - n_i(z)\| \le \epsilon'$ for all $i \in I$ and $y, z \in \partial A_i$ with $\|y - z\| \le 2\delta$.

Fix an $i \in I$, an $x \in A_i$ and an $r \in [0, \delta]$. If $B(x, r) \setminus A_i = \emptyset$, one can let $T$ be any half-space passing through $x$. Suppose for the rest of the proof that $B(x, r) \setminus A_i \neq \emptyset$, and let $y \in B(x, r) \cap \partial A_i$. Consider the open cones

$$C_- := \{y + z : n_i(y)\cdot z < -\epsilon'\|z\|\}, \qquad C_+ := \{y + z : n_i(y)\cdot z > \epsilon'\|z\|\},$$

illustrated in Figure 1. We shall verify that $B(y, 2\delta) \cap C_- \subset B(y, 2\delta) \cap A_i$ and $B(y, 2\delta) \cap C_+ \subset B(y, 2\delta) \setminus A_i$.

Figure 1. Illustration of the half-space approximation. The set $A_i$ is shown in light grey, and the cones $C_-$ and $C_+$ in dark grey.

Namely, let $u \in B(y, 2\delta) \cap C_-$ and write $u = y + z$. Suppose that $u \notin A_i$ and define $t_0 := \inf\{t \in [0,1] : y + tz \notin A_i\}$. Let $u_0 := y + t_0 z$ and notice that $u_0 \in B(y, 2\delta) \cap \partial A_i$. Moreover, the line segment $y + tz$ with $t \in [0,1]$ passes through $\partial A_i$ at $u_0$, and therefore $n_i(u_0)\cdot z \ge 0$, since $n_i$ is the outer-pointing normal of $A_i$. On the other hand,

$$n_i(u_0)\cdot\frac{z}{\|z\|} = \big(n_i(u_0) - n_i(y)\big)\cdot\frac{z}{\|z\|} + n_i(y)\cdot\frac{z}{\|z\|} < \|n_i(u_0) - n_i(y)\| - \epsilon' \le 0,$$

which is a contradiction, implying $C_- \cap B(y, 2\delta) \subset A_i \cap B(y, 2\delta)$. The case with $C_+$ is verified similarly.

Let us define the half-space $T := \{y - 2\epsilon' r\, n_i(y) + z : z\cdot n_i(y) \le 0\}$. It holds that $B(y, 2r) \cap T \subset B(y, 2r) \cap C_-$, since taking $y + w \in B(y, 2r) \cap T$ one has $n_i(y)\cdot w \le -2\epsilon' r < -\epsilon'\|w\|$. On the other hand, $B(y, 2r) \cap C_- \subset B(y, 2r) \cap A_i$ and $B(x, r) \subset B(y, 2r)$, so $B(x, r) \cap T \subset B(x, r) \cap A_i$. Clearly, $d(y, T) = 2\epsilon' r$, and since $x \notin C_+$ one has $n_i(y)\cdot(x - y) \le \epsilon'\|x - y\| \le \epsilon' r$. To conclude, $d(x, T) \le 3\epsilon' r$, and taking $\epsilon' = \epsilon/3$ yields the claim. □
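The construction can be checked numerically on the simplest smooth set, the closed unit disc $A \subset \mathbb{R}^2$, whose outer normal is $n(y) = y$. Since $\|n(y) - n(z)\| = \|y - z\|$ there, $\delta = \epsilon'/2$ is admissible. The sketch below (a hypothetical helper using Monte Carlo sampling of $B(x, r)$) checks that the sampled points of $B(x, r) \cap T$ lie in $A$ and that $d(x, T) \le 3\epsilon' r$.

```python
import math
import random


def halfspace_check(x, y, r, eps_p, n_samples=20000, seed=3):
    """Toy check of the half-space construction for the closed unit disc A.

    For the unit circle the outer normal is n(y) = y, so
    T = {y - 2 eps' r n(y) + z : z . n(y) <= 0} = {p : n(y) . p <= 1 - 2 eps' r}.
    Returns (all sampled points of B(x, r) ∩ T lie in A, distance d(x, T)).
    """
    rng = random.Random(seed)
    nx, ny = y                                           # n(y) = y on the unit circle
    level = (nx * y[0] + ny * y[1]) - 2.0 * eps_p * r   # = 1 - 2 eps' r
    inside = True
    for _ in range(n_samples):
        # uniform point in the disc B(x, r)
        rho = r * math.sqrt(rng.random())
        ang = 2.0 * math.pi * rng.random()
        p = (x[0] + rho * math.cos(ang), x[1] + rho * math.sin(ang))
        if nx * p[0] + ny * p[1] <= level and math.hypot(*p) > 1.0:
            inside = False
    dist = max(0.0, nx * x[0] + ny * x[1] - level)      # distance to {p : n . p <= level}
    return inside, dist
```

With $\epsilon' = 0.1$, $r = 0.05$, $x = (0.999, 0)$ and $y = (\cos 0.04, \sin 0.04)$, the ball $B(x, r)$ leaves the disc, so the nontrivial branch of the proof is exercised, and the computed distance is about $0.0082 \le 3\epsilon' r = 0.015$.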

Appendix B. Simultaneous Properties for Metropolis Kernels

Let us define the following generalisation of Assumption 16.
Assumption 25. Let $\mathcal{C}_d \subset \mathbb{R}^{d\times d}$ stand for the symmetric and positive definite matrices. Suppose $\mathcal{P} \subset \mathcal{C}_d$ and $\{q_s\}_{s\in\mathcal{P}}$ is a family of probability densities defined through

$$q_s(z) := |\det(s)|^{-1} q(\|s^{-1}z\|), \tag{19}$$

where $q : [0,\infty) \to (0,\infty)$ is a bounded, decreasing, and differentiable function satisfying the conditions in Assumption 16. Moreover, suppose that there is a $\kappa > 0$ such that the eigenvalues of each $s \in \mathcal{P}$ are bounded from below by $\kappa$.

Proposition 26. Suppose $\pi$ satisfies Assumption ?? and the family $\{q_s\}_{s\in\mathcal{P}}$ satisfies Assumption 25 with some $\kappa > 0$. Let $P_s$ be the Metropolis transition probability defined in (6) and using the proposal density $q_s$. Then, there exist a compact set $C \subset \mathbb{R}^d$, a probability measure $\nu$ on $C$ and a constant $b \in [0,\infty)$ such that for all $s \in \mathcal{P}$, $x \in \mathbb{R}^d$ and measurable $A \subset \mathbb{R}^d$,

$$P_s V(x) \le \lambda_s V(x) + b\,\mathbf{1}_C(x), \tag{20}$$

$$P_s(x, A) \ge \delta_s \mathbf{1}_C(x)\,\nu(A), \tag{21}$$

where $V(x) := c_V\pi^{-1/2}(x) \ge 1$ with $c_V := \sup_x \pi^{1/2}(x)$, and the constants $\lambda_s, \delta_s \in (0,1)$ satisfy the bound

$$\max\{(1-\lambda_s)^{-1}, \delta_s^{-1}\} \le c\,|\det(s)|$$

for some constant $c \ge 1$.



Proof. Proposition 26 is a generalisation of [20, Proposition 18], which considers Gaussian densities $q_s$. We shall describe the changes that are needed in the proof of [20, Proposition 18].

Let $s \in \mathcal{P}$. For a non-negative function $f$, one can write by Fubini's theorem

$$\int_{\mathbb{R}^d} f(z + x)\, q_s(z)\,dz = |\det(s)|^{-1}\int_0^{q(0)}\int_{\{q(\|s^{-1}z\|) > t\}} f(z + x)\,dz\,dt = -|\det(s)|^{-1}\int_0^\infty\int_{E_u} f(y)\,dy\; q'(u)\,du,$$

where the substitution $t = q(u)$ was used, and $E_u := \{x + z : \|s^{-1}z\| < u\}$. One has $\|s^{-1}z\| \le \kappa^{-1}\|z\|$ and thus $E_u \supset B(x, u\kappa)$. The conditions in Assumption 16 for the derivative $q'$ correspond to the estimate obtained in [20, Lemma 17] for a Gaussian family, that is, with $q(x) = e^{-x^2/2}$. These facts are enough to complete the proof of [20, Proposition 18] and yield the claim. □
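The Fubini identity at the start of this proof can be sanity-checked in dimension $d = 1$, where $s = \sigma > 0$, $q_s(z) = \sigma^{-1}q(|z|/\sigma)$ and $E_u = (x - \sigma u, x + \sigma u)$. Taking, for illustration, $q$ to be the standard normal density and $f(y) = y^2$, the left-hand side equals $\mathbb{E}(x + \sigma W)^2 = x^2 + \sigma^2$; the sketch below evaluates the right-hand side with the trapezoidal rule (the truncation point `upper` and the step count are arbitrary choices).

```python
import math


def q_profile(u):
    """Illustrative radial profile: the standard normal density exp(-u^2/2) / sqrt(2 pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def q_profile_deriv(u):
    """Derivative q'(u) = -u q(u) of the profile above."""
    return -u * q_profile(u)


def layer_cake_rhs(x, sigma, upper=20.0, steps=20000):
    """Right-hand side -(1/sigma) * int_0^inf (int_{E_u} f(y) dy) q'(u) du for d = 1,
    f(y) = y^2 and E_u = (x - sigma u, x + sigma u), via the trapezoidal rule on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        a, b = x - sigma * u, x + sigma * u
        inner = (b ** 3 - a ** 3) / 3.0          # int_a^b y^2 dy
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * inner * q_profile_deriv(u)
    return -total * h / sigma
```

For instance, `layer_cake_rhs(0.7, 1.3)` agrees with $0.7^2 + 1.3^2 = 2.18$ to many digits.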

Proposition 27. Suppose the family $\{q_s\}_{s\in\mathcal{P}}$ satisfies Assumption 25 with some $\kappa > 0$. Suppose, in addition, that either

(i) $V \equiv 1$, or

(ii) $\pi$ satisfies Assumption ?? and $V(x) := c_V\pi^{-1/2}(x) \ge 1$ with $c_V := \sup_x \pi^{1/2}(x)$.

Then, there are constants $c_1, c_2 > 0$ such that for the Metropolis transition probability $P_s$ given in (6) it holds that

$$\|P_s f - P_{s'} f\|_{V^r} \le c_1\max\{\|s\|, \|s'\|\}^{c_2}\|f\|_{V^r}\|s - s'\| \tag{22}$$

for all $s, s' \in \mathcal{P}$ and $r \in [0,1]$. The matrix norm above is the Frobenius norm defined as $\|a\| := \sqrt{\operatorname{tr}(a^T a)}$.

Proof. Consider first (i). From the definition of the Metropolis kernel (6), one obtains

$$\sup_x |P_s f(x) - P_{s'} f(x)| \le 2\sup_x |f(x)|\int |q_s(x) - q_{s'}(x)|\,dx.$$

For (ii), Proposition 12 of [1] shows that for any $r \in [0,1]$ it holds that

$$\|P_s f - P_{s'} f\|_{V^r} \le 2\|f\|_{V^r}\int |q_s(x) - q_{s'}(x)|\,dx,$$

so it is sufficient to consider only the total variation of the proposal distributions. As in [3] and [??], one can write

$$\int |q_s(x) - q_{s'}(x)|\,dx = \int\Big|\int_0^1 \frac{d}{dt}q_{s_t}(x)\,dt\Big|\,dx,$$

where $s_t := s' + t(s - s')$. Let us compute
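$$\frac{d}{dt}q_{s_t}(x) = |\det(s_t)|^{-1}\Big(-\operatorname{tr}\big(s_t^{-1}(s - s')\big)\,q(\|s_t^{-1}x\|) + q'(\|s_t^{-1}x\|)\,\frac{d}{dt}\|s_t^{-1}x\|\Big)$$

and

$$\frac{d}{dt}\|s_t^{-1}x\| = -\frac{(s_t^{-1}x)^T s_t^{-1}(s - s')\,s_t^{-1}x}{\|s_t^{-1}x\|}.$$

Since $s - s'$ and $s_t^{-1}$ are symmetric and $s_t^{-1}$ is positive definite, it holds that $|\operatorname{tr}(s_t^{-1}(s - s'))| \le \operatorname{tr}(s_t^{-1})\max_{1\le i\le d}|\lambda_i| \le \operatorname{tr}(s_t^{-1})\|s - s'\|$, where $\lambda_i$ are the eigenvalues of $s - s'$ [see, e.g., 22]. Since the Frobenius norm is sub-multiplicative,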



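$$\int |q_s(x) - q_{s'}(x)|\,dx \le \sup_{t\in[0,1]}|\det(s_t)|^{-1}\Big(\operatorname{tr}(s_t^{-1})\int q(\|s_t^{-1}x\|)\,dx + \|s_t^{-1}\|^2\int \|x\|\,|q'(\|s_t^{-1}x\|)|\,dx\Big)\|s - s'\|$$

$$\le \Big(d\kappa^{-1} + d\kappa^{-d-2}C_d\sup_{\|u\|=1,\ t\in[0,1]}\int_0^\infty r^d\,|q'(r\|s_t^{-1}u\|)|\,dr\Big)\|s - s'\|$$

by polar integration, with a constant $C_d$ depending only on $d$; here $\int q(\|s_t^{-1}x\|)\,dx = |\det(s_t)|$ because $q_{s_t}$ is a probability density, and the eigenvalues of $s_t$ are bounded from below by $\kappa$. Denote $\lambda = \lambda(u, t) := \|s_t^{-1}u\|$ and observe that, since $q$ is decreasing, integration by parts yields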



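$$\int_0^M r^d\,|q'(\lambda r)|\,dr = \frac{d}{\lambda}\int_0^M r^{d-1}q(\lambda r)\,dr - \frac{M^d\,q(\lambda M)}{\lambda} \le \frac{d}{\lambda^{d+1}}\int_0^\infty u^{d-1}q(u)\,du = \frac{d\,C_q}{\lambda^{d+1}}$$

for all $M > 0$, where $C_q := \int_0^\infty u^{d-1}q(u)\,du$ is finite because $q_s$ is a probability density. Since, for any $\|u\| = 1$ and $t \in [0,1]$, $\lambda^{-1}$ does not exceed the larger of the maximum eigenvalues of $s$ and $s'$, which in turn does not exceed $\max\{\|s\|, \|s'\|\}$, we obtain

$$\int |q_s(x) - q_{s'}(x)|\,dx \le c_1\max\{\|s\|, \|s'\|\}^{d+1}\|s - s'\|,$$

concluding the proof. □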

Lemma 28. Suppose the template proposal density $q$ is given as $q(z) = c\,q(\|\Sigma^{-1/2}z\|)$, where $c > 0$ is a constant and $\Sigma \in \mathbb{R}^{d\times d}$ is a symmetric and positive definite matrix, and

(i) $q(x) = e^{-x^2/2}$, or

(ii) $q(x) = (1 + x^2)^{-d/2-\gamma}$ for some $\gamma > 0$.

That is, $q$ is a (multivariate) Gaussian or Student distribution, respectively. Then, $q$ satisfies Assumption 16.

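Proof. For (i), Assumption 16 is implied by [20, Lemma 17].

Assume then that $q$ has the form (ii) and fix an $\epsilon > 0$. Denote $c_1 := d + 2\gamma$ and $a := d/2 + \gamma + 1$, so that $q'(x) = -c_1 x(1 + x^2)^{-a}$. By the mean value theorem, one can write for some $\epsilon' \in [0, \epsilon]$

$$q'(x) - 2q'(x + \epsilon) = \frac{c_1(x + 2\epsilon)}{(1 + (x+\epsilon)^2)^a} - \frac{2ac_1\epsilon\, x(x + \epsilon')}{(1 + (x+\epsilon')^2)^{a+1}}$$

$$\ge \frac{c_1 x}{(1 + (x+\epsilon)^2)^a}\left(1 - \frac{2a\epsilon(x + \epsilon')}{1 + (x+\epsilon')^2}\left(\frac{1 + (x+\epsilon)^2}{1 + (x+\epsilon')^2}\right)^{a}\right) \ge 0$$

for all $x \ge 0$, whenever $\epsilon > 0$ is sufficiently small. □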
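The inequality established for the Student profile can be spot-checked on a grid. The sketch below evaluates $q'(x) - 2q'(x + \epsilon)$ for $q(x) = (1 + x^2)^{-d/2-\gamma}$ on $[0, 10^3]$: a small $\epsilon$ gives a non-negative difference everywhere on the grid, while a large $\epsilon$ does not, in line with the requirement that $\epsilon$ be sufficiently small. A finite grid is only evidence, not a proof, and the grid size is an arbitrary choice.

```python
def student_profile_deriv(x, d, gamma):
    """q'(x) for q(x) = (1 + x^2)^(-(d/2 + gamma)), i.e. -c1 x / (1 + x^2)^a
    with c1 = d + 2 gamma and a = d/2 + gamma + 1."""
    c1 = d + 2.0 * gamma
    a = d / 2.0 + gamma + 1.0
    return -c1 * x / (1.0 + x * x) ** a


def gap_nonnegative(d, gamma, eps, x_max=1e3, steps=100000):
    """Grid check that q'(x) - 2 q'(x + eps) >= 0 for 0 <= x <= x_max."""
    for i in range(steps + 1):
        x = x_max * i / steps
        gap = student_profile_deriv(x, d, gamma) - 2.0 * student_profile_deriv(x + eps, d, gamma)
        if gap < 0:
            return False
    return True
```

For $d = 2$, $\gamma = 1$ the check passes with $\epsilon = 0.05$ but fails with $\epsilon = 5$ (already near $x = 0.01$).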